ASYMPTOTICALLY OPTIMAL MULTISTAGE TESTS OF
SIMPLE HYPOTHESES

By Jay Bartroff

University of Southern California 

A family of variable stage size multistage tests of simple hypothe- 
ses is described, based on efficient multistage sampling procedures. 
Using a loss function that is a linear combination of sampling costs 
and error probabilities, these tests are shown to minimize the inte- 
grated risk to second order as the costs per stage and per observation 
approach zero. A numerical study shows significant improvement over 
group sequential tests in a binomial testing problem. 

1. Introduction and summary. Multistage hypothesis tests have practi- 
cal advantages over fully-sequential tests in many situations since it is often 
more costly to perform n single experiments than a single experiment of 
size n. The theory of efficient multistage tests has been developed in essen- 
tially two directions. The first is general existence and uniqueness results of 
Schmitz [20], who shows that optimal multistage procedures do exist for a 
large class of problems and that the optimum has the renewal-type prop- 
erty that at each stage it behaves as if it were starting from scratch given 
the data so far, and Morgan and Cressie [5, 18], who prove the existence 
of a multistage competitor of the SPRT. However, these general results do 
not tell us anything more specific about the optimal tests and certainly not 
how to apply them without resorting to backward induction-type computer 
algorithms or artificial truncations. The second direction is truncated (pre- 
determined number of stages) and group sequential (constant stage size) 
tests, of which many have been developed for clinical trials; see Pocock [19], 
Wang and Tsiatis [21], Kim and DeMets [12], Eales and Jennison [7, 8], 
Jennison and Turnbull [11], Barber and Jennison [1] and Lai and Shih [13]. 
These authors do provide specific tests that successfully address many
practical issues arising in clinical trials, but are not concerned with optimality
in a general setting, and those that do prove optimality do so under
severe restrictions of truncation or constant stage sizes. Lorden [17] presents
a three-stage test that has asymptotically the same total sample size as the 
SPRT and shows that three stages are necessary for any multistage test to 
have this property. 

These previous results do not address a fundamental question in multi- 
stage testing: How does one choose the size of the next stage optimally, given 
the data observed so far and free of oversimplifying restrictions? This paper 
aims to answer this question by introducing a family of variable stage size 
multistage tests which can be described by simple, closed-form equations 
and are asymptotically optimal, without relying on truncations or group se- 
quential restrictions. We focus here on testing simple hypotheses; extension 
of these ideas to composite hypotheses is discussed in the author's Ph.D. 
thesis [2]. 

A common theme in sequential testing is that testing hypotheses can often 
be reduced to a "power one" test, that is, a test that stops sampling as soon 
as there is sufficient evidence that the null hypothesis is true but is content 
to continue sampling forever if it appears that the alternative hypothesis 
is true. For example, in the fully-sequential setting, Lorden [15, 16] shows 
that once a substantial number of observations have been taken, asymptotic 
optimality considerations for testing simple hypotheses can be reduced to 
considering only power one tests involving the estimated true state of na- 
ture versus the opposing hypothesis. Moreover, finding an optimal power 
one test typically reduces to solving a boundary crossing problem for the 
relevant test statistic. This suggests the following informal hierarchy: 

Test of simple hypotheses 
reduces to 
Power one test 
reduces to 
Boundary crossing problem. 

In order to derive optimal multistage tests, we consider these three prob- 
lems in reverse order. In Section 2 we present asymptotically optimal multi- 
stage samplers, procedures that sample a random process in stages until it 
crosses a predetermined boundary. This problem was considered for Brow- 
nian motion by Bartroff [3] and we extend those results here to i.i.d., non- 
normal data. In Section 3 we use the optimal multistage samplers to design 
efficient power one tests. In Section 4 we use combinations of these power 
one tests to design efficient hypothesis tests. Here efficiency is measured by 
a linear combination of expected sample size, expected number of stages and 
error probabilities. Our tests are shown to be second order optimal as the 
costs per stage and per observation approach zero, which corresponds to a
large sample size. In marked contrast to constant stage-size group sequential
tests, the asymptotically optimal tests and samplers presented here
necessarily have stage sizes that decrease roughly as successive iterations of the
function $x \mapsto \sqrt{x \log x}$ with probability close to 1, while the average number
of stages used is determined by the asymptotics of the ratio of the cost per 
stage to cost per observation. In Section 5 we propose a finite-sample proce- 
dure and present the results of a simulation study comparing it with group 
sequential tests of hypotheses about the probability of success of Bernoulli 
trials. The variable stage size tests show substantial improvement over the 
constant stage size tests. 
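
As an informal illustration (not part of the original analysis), the following Python sketch iterates the map $x \mapsto \sqrt{x \log x}$ mentioned above to show how quickly the scale of successive stages shrinks. The function name `iterate_sqrt_xlogx` and the starting value are hypothetical choices made only for this example.

```python
import math

def iterate_sqrt_xlogx(x0, steps):
    """Return [x0, g(x0), g(g(x0)), ...] for g(x) = sqrt(x * log(x)).

    Iteration stops early once x <= 1, where log(x) <= 0 and g is no
    longer meaningful as a stage-size scale.
    """
    values = [float(x0)]
    for _ in range(steps):
        x = values[-1]
        if x <= 1.0:
            break
        values.append(math.sqrt(x * math.log(x)))
    return values

if __name__ == "__main__":
    # Starting scale 10,000 is an arbitrary illustrative choice.
    for k, value in enumerate(iterate_sqrt_xlogx(10_000, 4)):
        print(k, round(value, 2))
```

Each iterate is much smaller than the one before it, which is the qualitative behavior the variable stage sizes are described as following.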

2. Multistage samplers. Consider sampling $X_1, X_2, \ldots$ in stages until
$\sum X_i \ge a > 0$ at the end of a stage, and in such a way as to minimize

(2.1) $c \cdot EN + d \cdot EM$,

where $N$, $M$ are the total sample size and number of stages used. Here $c, d > 0$
represent the costs per observation and per stage, so the sum (2.1) is the
average cost incurred in crossing the boundary. On one hand, taking a large
number of small stages would make $c \cdot EN$ small but $d \cdot EM$ large; on the
other hand, taking a small number of large stages would make $c \cdot EN$ large
but $d \cdot EM$ small. Thus, the sampler that minimizes (2.1) can be thought of
as the optimal compromise between these two extreme sampling strategies.
In this section, after some necessary preliminaries, we define a multistage
sampling strategy, show in Theorem 2.1 that it asymptotically minimizes
this sampling cost, and show conversely in Theorem 2.2 that any efficient
sampler must behave similarly; all theorems are proved in the Appendix.
This sampler will be used to construct efficient multistage tests in Sections 3
and 4.

Assume that $X, X_1, X_2, \ldots$ are i.i.d. We say that $X$ is strongly nonlattice
if the characteristic function $\nu(t)$ of $X$ satisfies

(2.2)

for some $\eta > 0$. We assume that one of the following three conditions holds:

(2.3) The distribution of $X$ is strongly nonlattice and $EX^4 < \infty$.

(2.4) The distribution of $X$ is lattice and $EX^4 < \infty$.

(2.5) There is an $H > 0$ such that $Ee^{tX} < \infty$ for $|t| < H$.

These conditions are what is needed for the necessary sharp large deviation
estimates; see Lemma A.1. We essentially require $X$ to have a finite fourth
moment and to be lattice or strongly nonlattice [(2.3)-(2.4)]. However, if
this does not hold, then our results are still valid if the moment generating
function is finite in a neighborhood of the origin [(2.5)]. Assume that
$\mu = EX > 0$. Since the problem is not changed by multiplying the $X_i$ and the
boundary $a > 0$ by a positive constant, we assume without loss of generality
that $\operatorname{Var} X = 1$.

We will describe the stage sizes of a multistage sampler by a sequence of
nonnegative integer-valued random variables $\mathbf{N} = (N_1, N_2, \ldots)$ such that

(2.6) $N_{k+1} \cdot 1\{N_1 + \cdots + N_k = n\} \in \mathcal{E}_n$ for all $n \ge 1$,

where $\mathcal{E}_n$ is the class of all random variables determined by $X_1, \ldots, X_n$.
The interpretation of the measurability requirement (2.6) is that by the
time $N^k = N_1 + \cdots + N_k$, the end of the first $k$ stages, an observer who
knows the values $X_1, \ldots, X_{N^k}$ also knows $N_{k+1}$, the size of the $(k+1)$st
stage. We also let $N$ denote the total sample size $N^M$, where
$M = \inf\{m \ge 1 : X_1 + \cdots + X_{N^m} \ge a\}$, the total number of stages. A multistage
sampler is a pair $\delta(x) = (\mathbf{N}, M)$, where the argument $x > 0$ is the initial
distance to the boundary. When there is no confusion as to which sampler is
being used, we will write $S_k = X_1 + \cdots + X_{N^k}$, $S_0 = 0$.

After dividing (2.1) through by $c$, minimizing (2.1) is seen to be equivalent
to minimizing

(2.7) $EN + h \cdot EM$,

where $h = d/c$. By Wald's equation,

(2.8) $EN = ES_M/\mu = a/\mu + E(S_M - a)/\mu \ge a/\mu$,

so the sampler that minimizes

(2.9) $E(N - a/\mu) + h \cdot EM$

also minimizes (2.7). Also, using (2.9) instead of (2.7) will lead to a more
refined "first-order" asymptotic theory.
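
To make the trade-off behind (2.7) concrete, the following Python sketch (an illustration only, not a procedure from the paper) simulates samplers with a constant stage size $k$ on i.i.d. $N(\mu, 1)$ data and estimates $EN$, $EM$ and $EN + h \cdot EM$. The function names, the normal data model and all numerical values are assumptions made for this example. The sketch also returns the average of $S_M$ so that Wald's equation $EN = ES_M/\mu$ can be checked numerically.

```python
import math
import random

def run_constant_stage_sampler(a, k, mu, rng):
    """Sample i.i.d. N(mu, 1) data in stages of size k until the partial sum
    is >= a at the end of a stage. Returns (N, M, S_M)."""
    total, n, m = 0.0, 0, 0
    while total < a:
        # The sum of k i.i.d. N(mu, 1) variables is exactly N(k*mu, k).
        total += rng.gauss(k * mu, math.sqrt(k))
        n += k
        m += 1
    return n, m, total

def estimate_cost(a, k, mu, h, reps=2000, seed=0):
    """Monte Carlo estimates of EN, EM, ES_M and the cost EN + h*EM."""
    rng = random.Random(seed)
    sum_n = sum_m = sum_s = 0.0
    for _ in range(reps):
        n, m, s = run_constant_stage_sampler(a, k, mu, rng)
        sum_n += n
        sum_m += m
        sum_s += s
    en, em, es = sum_n / reps, sum_m / reps, sum_s / reps
    return {"EN": en, "EM": em, "ES": es, "cost": en + h * em}

if __name__ == "__main__":
    # Illustrative values only: boundary a = 50, drift mu = 0.5, ratio h = 10.
    for k in (1, 25, 400):
        print(k, estimate_cost(50.0, k, 0.5, 10.0))
```

Under these assumed values, very small stages pay heavily through $h \cdot EM$ and very large stages pay through the overshoot in $EN$, so an intermediate stage size has the smallest estimated cost among the three.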

The problem of describing the sampler that asymptotically minimizes
(2.9) to first-order essentially reduces to considering only certain classes of
sequences $\{(a,h)\}$, defined with respect to the critical functions

(2.10) $h_m(x) = x^{(1/2)^m} (\log x)^{1/2 - (1/2)^m}$ for $m \ge 1$, $h_0(x) = x$.

To describe a sampler that asymptotically minimizes (2.9) to first-order, it
suffices to consider sequences $\{(a,h)\}$ such that $a \to \infty$. Letting "$\ll$" denote
asymptotically of smaller order, it will turn out that good samplers use $m$
stages (with probability approaching 1) if $\{(a,h)\}$ satisfies

(2.11) $h_m(a) \ll h \ll h_{m-1}(a)$

as $a \to \infty$ and use $m$ or $m + 1$ stages (with probability approaching 1) if
$\{(a,h)\}$ satisfies

(2.12) $\lim_{a \to \infty} h/h_m(a) \in (0, \infty)$.

A sequence $\{(a,h)\}$ satisfying (2.11) is said to be in the $m$th critical band,
while one satisfying (2.12) is said to be on the boundary between critical
bands $m$ and $m + 1$. Since it will prove convenient to treat $h$ as a function
of $a$, we thus consider (2.9) with $h$ replaced by a function $h(a)$ such that
$\{(a,h(a))\}$ is either in the $m$th critical band or on the boundary between
critical bands $m$ and $m + 1$ (for every sequence of $a$'s approaching $\infty$). That
is, let

$\mathcal{B}^o_m = \{h : (0,\infty) \to (0,\infty) \mid h_m \ll h \ll h_{m-1}\}$,

$\mathcal{B}^+_m = \{h : (0,\infty) \to (0,\infty) \mid \lim_{x \to \infty} h(x)/h_m(x) \in (0,\infty)\}$,

$\mathcal{B}_m = \mathcal{B}^o_m \cup \mathcal{B}^+_m$

and assume $h \in \mathcal{B}_m$ for some $m \ge 1$. Our notation reflects that, as $a \to \infty$,
the average number of stages of an efficient sampler approaches

$m$ if $h \in \mathcal{B}^o_m$,
$m + \varepsilon$ if $h \in \mathcal{B}^+_m$,

where $\varepsilon \in (0,1)$ is a function of $\lim_{x \to \infty} h(x)/h_m(x)$; Figure 1 summarizes
this relationship. We define the risk of a sampler $\delta(a) = (\mathbf{N}, M)$ to be

(2.13) $R_h(\delta(a)) = E(N - a/\mu) + h(a) EM$.

Note that, by (2.8), the definition of risk (2.13) is equivalent to the
expectation of a linear combination of the overshoot $S_M - a$ and the number of
stages used. Define the Bayes sampler $\delta^* = (\mathbf{N}^*, M^*)$ to be one that achieves
$R^*_h(a) = \inf_\delta R_h(\delta(a))$.
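
The critical functions (2.10) are easy to evaluate numerically. The Python sketch below computes $h_m(x) = x^{(1/2)^m}(\log x)^{1/2-(1/2)^m}$ and, as a purely heuristic finite-sample device that is our assumption and not a definition from the paper, reports the index $m$ with $h_m(a) \le h < h_{m-1}(a)$. The asymptotic relation "$\ll$" cannot be verified at a single pair $(a, h)$, so `nearest_band` is only a rough guide.

```python
import math

def critical_function(m, x):
    """h_m(x) = x**((1/2)**m) * (log x)**(1/2 - (1/2)**m) for m >= 1; h_0(x) = x."""
    if m == 0:
        return x
    p = 0.5 ** m
    return x ** p * math.log(x) ** (0.5 - p)

def nearest_band(a, h, max_m=30):
    """Heuristic: smallest m >= 1 with h_m(a) <= h < h_{m-1}(a), else None.

    This is a finite-sample stand-in for the asymptotic band conditions
    and is not part of the original theory.
    """
    for m in range(1, max_m + 1):
        if critical_function(m, a) <= h < critical_function(m - 1, a):
            return m
    return None

if __name__ == "__main__":
    a = 10_000.0  # illustrative boundary
    for h in (500.0, 50.0, 8.0):
        print(h, nearest_band(a, h))
```

Smaller values of $h$ (cheaper stages relative to observations) land in higher bands, matching the statement that efficient samplers then use more stages.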

For $x > 0$ and $z \in \mathbb{R}$, let $t = t(x,z)$ be the unique solution of
$(x - \mu t)/\sqrt{t} = z$, that is,

(2.14) $t(x,z) = x/\mu - z \, \frac{\sqrt{4x\mu + z^2} - z}{2\mu^2}$

by some simple algebra. Let $z_p$ be the upper $p$-quantile of the standard
normal distribution. If the $X_i$ are i.i.d. $N(\mu, 1)$ and $t(x, z_p) = n$ is an integer,
then the probability that $X_1 + \cdots + X_n$ exceeds $x$ is $p$. This holds
approximately when the $X_i$ are not normal by large deviations and this is why
$t$ is useful in parameterizing stage sizes. Let $\Phi$ and $\phi$ denote the standard
normal distribution function and density. Let

(2.15) $u_m(z) = m + \Phi(z) + \psi^+(z) \cdot \frac{\phi(z)}{1 - \Phi(z)}$,

where $\psi^+(z) = \phi(z) - \Phi(-z) z$ was defined by Chernoff [4]. We extend the
domain of $u_m$ to $[-\infty, \infty)$ by adopting the convention $u_m(-\infty) =
\lim_{z \to -\infty} u_m(z) = m$. The function $u_m$ appears in the second-order term
of the Bayes risk; see Theorem 2.1.
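
The role of $t(x,z)$ can be checked numerically. The following Python sketch (illustrative only; the function names are ours) evaluates the closed form obtained by solving $(x - \mu t)/\sqrt{t} = z$ as a quadratic in $\sqrt{t}$, verifies that it satisfies the defining equation, and confirms that for $N(\mu,1)$ data a stage of size $t(x, z_p)$ crosses $x$ with probability $p$. It also evaluates $\psi^+(z) = \phi(z) - z\Phi(-z)$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, pdf is phi

def stage_time(x, z, mu):
    """Unique t > 0 with (x - mu*t)/sqrt(t) = z, for x > 0 and mu > 0."""
    root = math.sqrt(4.0 * x * mu + z * z)
    return x / mu - z * (root - z) / (2.0 * mu * mu)

def psi_plus(z):
    """psi^+(z) = phi(z) - z * Phi(-z), which equals E[(Z - z)^+] for Z ~ N(0, 1)."""
    return STD_NORMAL.pdf(z) - z * STD_NORMAL.cdf(-z)

def crossing_probability(x, n, mu):
    """P(X_1 + ... + X_n > x) for i.i.d. N(mu, 1) data (n need not be an integer here)."""
    return 1.0 - STD_NORMAL.cdf((x - mu * n) / math.sqrt(n))

if __name__ == "__main__":
    x, mu, p = 100.0, 0.5, 0.1          # illustrative values
    z_p = STD_NORMAL.inv_cdf(1.0 - p)   # upper p-quantile
    t = stage_time(x, z_p, mu)
    print(t, crossing_probability(x, t, mu))
```

A positive $z$ gives a stage shorter than $x/\mu$ that is unlikely to cross the boundary, while a negative $z$ gives a longer stage that is likely to cross; this is how $z$ parameterizes the boldness of a stage.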

Before defining the asymptotically optimal samplers $\delta^o_{m,h}$ and $\delta^+_{m,z}$, we
define an auxiliary sampler $\delta_n$ that will be used for the final stages of
$\delta^o_{m,h}$ and $\delta^+_{m,z}$. For $n \in \mathbb{N}$, $\delta_n$ samples a first stage of size $n$, followed
(if necessary) by stages of constant size $\lceil n^{1/2} \rceil$. It is shown in Lemma A.2 in
the Appendix that $n = n(a) \to \infty$ can be chosen so that the overshoot of $\delta_n$ is
not too large but its expected number of stages approaches 1 as $a \to \infty$; for
this reason, we refer to $\delta_n$ as bold sampling.
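
A small simulation conveys why this is called bold sampling. The Python sketch below is an illustration under assumptions of our own: normal data, a first stage chosen as $\lceil t(a, z) \rceil$ with a fixed negative $z$, and hypothetical function names. It is not the specific choice of $n(a)$ made in Lemma A.2. It estimates the expected sample size and expected number of stages of $\delta_n$.

```python
import math
import random

def stage_time(x, z, mu):
    """Unique t > 0 with (x - mu*t)/sqrt(t) = z."""
    root = math.sqrt(4.0 * x * mu + z * z)
    return x / mu - z * (root - z) / (2.0 * mu * mu)

def run_bold_sampler(a, n, mu, rng):
    """First stage of size n, then (if needed) stages of size ceil(sqrt(n)),
    on i.i.d. N(mu, 1) data, until the partial sum is >= a. Returns (N, M)."""
    later = math.ceil(math.sqrt(n))
    total = rng.gauss(n * mu, math.sqrt(n))  # exact law of the first-stage sum
    size, stages = n, 1
    while total < a:
        total += rng.gauss(later * mu, math.sqrt(later))
        size += later
        stages += 1
    return size, stages

def bold_sampler_summary(a, z, mu, reps=1000, seed=0):
    """Return (n, estimated EN, estimated EM) for first stage n = ceil(t(a, z))."""
    rng = random.Random(seed)
    n = math.ceil(stage_time(a, z, mu))
    runs = [run_bold_sampler(a, n, mu, rng) for _ in range(reps)]
    en = sum(r[0] for r in runs) / reps
    em = sum(r[1] for r in runs) / reps
    return n, en, em

if __name__ == "__main__":
    # Illustrative values: boundary 100, z = -3, drift 0.5.
    print(bold_sampler_summary(100.0, -3.0, 0.5))
```

With a strongly negative $z$ the first stage almost always crosses the boundary, so the estimated number of stages is close to 1, at the price of a first stage noticeably longer than $a/\mu$.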

Finally, we define the samplers $\delta^o_{m,h}$ and $\delta^+_{m,z}$, which are shown to be
asymptotically optimal below under different conditions. Namely, the
sampler $\delta^o_{m,h}$ will be optimal when $h \in \mathcal{B}^o_m$ and $\delta^+_{m,z}$ will be optimal when
$h \in \mathcal{B}^+_m$. These samplers are extensions to nonnormal i.i.d. data of the
samplers of Bartroff [3] for Brownian motion. Let $n(x,z) = \lceil t(x,z) \rceil$ and
$f(x) = (4/\sqrt{\mu}) \sqrt{x \log(x+1)}$. Note that $f^{-1}$ is well defined since $f$ is
increasing. The samplers $\delta^o_{m,h}(x)$ are indexed by a positive integer $m$ and a
positive function $h$, and the argument $x$ is the initial distance to the
boundary. Define $\delta^o_{m,h}$ inductively on $m$ as

$\delta^o_{1,h}(x) = \delta_{n(x,\zeta(x))}(x)$, where $\zeta(x) = -\left(\sqrt{h(x)/\sqrt{x}} \wedge \sqrt{(3/2)\log(x+1)}\right)$,

$\delta^o_{m+1,h}(x)$ = 1st stage $n\left(x, \sqrt{(1 - 2^{-m})\log(x+1)}\right)$,
followed (if necessary) by $\delta^o_{m,h \circ f^{-1}}(x - S_1)$.

The samplers $\delta^+_{m,z}(x)$, indexed by a positive integer $m$ and a number $z \in \mathbb{R}$,
are defined inductively on $m$ as

$\delta^+_{1,z}(x)$ = 1st stage $n(x,z)$, followed (if necessary) by
$\delta_{\nu(x - S_1)}(x - S_1)$, where $\nu(y) = n\left(y, -\sqrt{\log(y+1)}\right)$,

$\delta^+_{m+1,z}(x)$ = 1st stage $n\left(x, \sqrt{(1 - 2^{-m})\log(x+1)}\right)$,
followed (if necessary) by $\delta^+_{m,z}(x - S_1)$.

Theorem 2.1. Assume $h \in \mathcal{B}_m$. Let $z^* \in [-\infty, \infty)$ be the unique
solution of

(2.16) $\frac{\phi(z^*)}{1 - \Phi(z^*)} = \lim_{x \to \infty} \frac{\kappa_m h_m(x)}{h(x)}$,

where

(2.17) $\kappa_m = \mu^{-2 + (1/2)^m} \prod_{i=1}^{m-1} \left[(1/2)^{m-1-i} - (1/2)^{m-1}\right]^{(1/2)^{i+1}}$.

Then $R^*_h(a) \sim u_m(z^*) h(a)$ as $a \to \infty$. If

$\delta = \delta^o_{m,h}$ if $h \in \mathcal{B}^o_m$,
$\delta = \delta^+_{m,z^*}$ if $h \in \mathcal{B}^+_m$,

then as $a \to \infty$,

(2.18) $R_h(\delta(a)) \sim u_m(z^*) h(a)$.

Theorem 2.2 provides a converse to Theorem 2.1, showing that the type
of sampling used by $\delta^o_{m,h}$ and $\delta^+_{m,z}$ is necessary for any efficient procedure.
Let $F_y(x) = \sqrt{x \log(x/y^2)}$ and for a function $h$ and $k \in \mathbb{N}$ define

$F^{(k)}_h(x) = F^{(k)}_y(x) \big|_{y = h(x)}$,

where the superscript $(k)$ on the right-hand side denotes the $k$th iterate.
Bartroff ([3], Lemma 8) showed that $F^{(k)}_h(a)$ is the order of magnitude of
how far $\delta^o_{m,h}$ and $\delta^+_{m,z}$ are from the boundary (with probability approaching
1) after the $k$th stage. Theorem 2.2 shows that any sampler that does not
follow this "schedule" is necessarily suboptimal.

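Theorem 2.2. Assume that $h \in \mathcal{B}_m$ and let

$\delta = \delta^o_{m,h}$ if $h \in \mathcal{B}^o_m$,
$\delta = \delta^+_{m,z^*}$ if $h \in \mathcal{B}^+_m$,

where $z^*$ is as in (2.16). If $\delta' = (\mathbf{N}, M)$ is a sampler such that there is a
sequence $a_i \to \infty$ with

(2.19) $P\left(a_i - S_k \ge (1 - \varepsilon)(1/\mu)^{1 - 2^{-k}} F^{(k)}_h(a_i)\right)$ bounded below 1

for some $1 \le k < m$ and $\varepsilon > 0$, then

(2.20) $\frac{R_h(\delta'(a_i)) - R_h(\delta(a_i))}{h(a_i)} \to +\infty$

as $i \to \infty$. In particular, (2.20) holds if $P(M \ge m) \not\to 1$.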



3. Power one tests. Consider the problem of deciding between two densities
$f_0$ and $f_1$ by sampling data in stages. Suppose that if $f_0$ is the true
density, sampling costs are high and so we want to stop sampling as soon as
possible and reject the hypothesis $f_1$. On the other hand, if $f_1$ is the true
density, suppose that sampling costs nothing and we are content to observe
the data ad infinitum. As an example, suppose a new drug is being marketed
under the hypothesis that its side effects are insignificant. Physicians
prescribing the drug record and report on the side effects and if they appear
unacceptably high ($f_0$), this must be announced and the drug withdrawn
from use. But as long as the hypothesis of insignificant side effects ($f_1$)
remains tenable, no action is required. Although this is an idealized example,
power one tests are important theoretical tools because we will use
combinations of them to derive optimal hypothesis tests; see Section 1 and the
paragraph preceding Section 4.1.

Let $X_1, X_2, \ldots$ be i.i.d. with density either $f_0$ or $f_1$, two distinct densities
with respect to some nondegenerate $\sigma$-finite measure. Define a power one
test of $f_0$ versus $f_1$ to be a pair $\delta = (\mathbf{N}, M)$ such that $\mathbf{N} = (N_1, N_2, \ldots)$
is a sequence of nonnegative integer-valued random variables satisfying the
measurability requirement (2.6), with $N$, $N^k$ and $M$ defined as in Section 2.
Note that a "power one test of $f_0$ versus $f_1$" may only reject $f_1$. If one pays
costs per observation and per stage under $f_0$, plus a cost for terminating
sampling under $f_1$, then a natural measure of the performance of a power
one test of $f_0$ versus $f_1$ is the expected sum of these costs. Hence, we define
the risk of a power one test $\delta = (\mathbf{N}, M)$ of $f_0$ versus $f_1$ to be

(3.1) $R_{c,d}(\delta) = c E_0 N + d E_0 M + P_1(N < \infty)$,

where $c, d > 0$. Let $\delta^* = (\mathbf{N}^*, M^*)$ be a Bayes test which achieves risk
$R^*_{c,d} = \inf_\delta R_{c,d}(\delta)$.

In this section we define a family of power one tests and show in
Theorem 3.1 that they minimize the risk to second-order as $c, d \to 0$. Obviously
the risk (3.1) depends on the rates at which $c$ and $d$ approach 0, much in the
same way that in Section 2 the efficiency of a multistage sampler depended
on the asymptotic properties of the function $h$, representing the ratio of the
cost per stage to the cost per observation, with respect to the critical
functions (2.10). It will turn out that the behavior of efficient hypothesis tests
will be determined by an analogous relationship, but with $d/c$ in place of $h$
and a multiple of $\log d^{-1}$ in place of the boundary $a$ in (2.11) and (2.12).
That is, it will turn out that efficient hypothesis tests use $m$ stages (with
probability approaching 1) if $c, d \to 0$ in such a way that

(3.2) $h_m(\log d^{-1}) \ll d/c \ll h_{m-1}(\log d^{-1})$

and will use $m$ or $m + 1$ stages (with probability approaching 1) if

(3.3) $\lim_{c,d \to 0} \frac{d/c}{h_m(\log d^{-1})} \in (0, \infty)$.

By analogy with Section 2, we give an essentially complete description of the
problem while assuming $c, d \to 0$ at rates satisfying (3.2) or (3.3). To update
our notation, let $\mathcal{B}^o_m$ be the set of all sequences $\{(c,d)\}$ such that
$1 > c, d \to 0$ and satisfying (3.2), let $\mathcal{B}^+_m$ be the set of all such sequences
satisfying (3.3), and let $\mathcal{B}_m = \mathcal{B}^o_m \cup \mathcal{B}^+_m$. We prove our main asymptotic
results below for sequences $\{(c,d)\} \in \mathcal{B}_m$ for some $m \ge 1$. Note that
$\{(c,d)\} \in \mathcal{B}_m$ implies $h_m(\log d^{-1}) = O(d/c)$; hence, a consequence of this
assumption is that $d/c \to \infty$. If $d/c$ were bounded, it can be shown that a test
with constant stage size and number of stages approaching $\infty$ minimizes the
risk (3.1) to second-order. Since our main interest here is variable stage size
tests with a small number of stages, we can be sure that the assumption
$\{(c,d)\} \in \mathcal{B}_m$ does not exclude any interesting cases.

In this section we use the multistage samplers of Section 2 as power
one tests by sampling the log-likelihood process $\log(f_0(X_i)/f_1(X_i))$ until
$\sum \log(f_0(X_i)/f_1(X_i))$ exceeds a predetermined boundary. Let

(3.4) $\sigma^2 = \operatorname{Var}_0 \log(f_0(X_i)/f_1(X_i))$, $Y_i = \sigma^{-1} \log(f_0(X_i)/f_1(X_i))$,

so that $E_0 Y_i = \sigma^{-1} I_0 > 0$ and $\operatorname{Var}_0 Y_i = 1$, where $I_0 = E_0 \log(f_0(X_i)/f_1(X_i))$
is the Kullback-Leibler information number. Whenever we use a multistage
sampler as a power one test in what follows, we mean with respect to
$Y_1, Y_2, \ldots$, which we assume satisfy one of (2.3)-(2.5). Our main result in
this section is that the asymptotically optimal multistage samplers derived
in Section 2 are second-order optimal as power one tests.
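
For a concrete instance of the standardization (3.4), the Python sketch below computes $I_0$, $\sigma^2$ and the drift $\sigma^{-1} I_0$ of the $Y_i$ for Bernoulli data. The Bernoulli model and the values $p_0 = 0.4$, $p_1 = 0.6$ are taken from the numerical example of Section 5; the function names are hypothetical.

```python
import math

def bernoulli_llr_moments(p0, p1):
    """Mean and variance under f_0 of log(f_0(X)/f_1(X)) for X ~ Bernoulli.

    Returns (I_0, sigma^2), with I_0 the Kullback-Leibler information number.
    """
    up = math.log(p0 / p1)                  # increment when X = 1
    down = math.log((1.0 - p0) / (1.0 - p1))  # increment when X = 0
    info = p0 * up + (1.0 - p0) * down
    second_moment = p0 * up ** 2 + (1.0 - p0) * down ** 2
    return info, second_moment - info ** 2

def standardized_increment(x, p0, p1):
    """Y = sigma^{-1} * log(f_0(x)/f_1(x)) for a single observation x in {0, 1}."""
    _, var = bernoulli_llr_moments(p0, p1)
    llr = math.log(p0 / p1) if x == 1 else math.log((1.0 - p0) / (1.0 - p1))
    return llr / math.sqrt(var)

if __name__ == "__main__":
    info, var = bernoulli_llr_moments(0.4, 0.6)
    print(info, var, info / math.sqrt(var))  # I_0, sigma^2, drift of Y_i
```

By construction the $Y_i$ have unit variance under $f_0$, which is the normalization assumed for the samplers of Section 2.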

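Theorem 3.1. Assume that the $Y_i$ satisfy one of (2.3)-(2.5), $\{(c,d)\} \in
\mathcal{B}_m$, and let $z^* \in [-\infty, \infty)$ be the unique solution of

(3.5) $\frac{\phi(z^*)}{1 - \Phi(z^*)} = \lim_{c,d \to 0} \frac{\kappa_m(\sigma^{-1} I_0) \, h_m(\sigma^{-1} \log d^{-1})}{d/c}$,

where $\kappa_m(\mu)$ is as in (2.17). Then $R^*_{c,d} = c I_0^{-1} \log d^{-1} + u_m(z^*) d + o(d)$ as
$c, d \to 0$. If $\delta$ is the power one test

$\delta = \delta^o_{m,d/c}(\sigma^{-1} \log d^{-1})$ if $\{(c,d)\} \in \mathcal{B}^o_m$,
$\delta = \delta^+_{m,z^*}(\sigma^{-1} \log d^{-1})$ if $\{(c,d)\} \in \mathcal{B}^+_m$,

then as $c, d \to 0$,

(3.6) $R_{c,d}(\delta) = c I_0^{-1} \log d^{-1} + u_m(z^*) d + o(d)$.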

4. Tests of simple hypotheses. In this section we use the optimal power
one tests from the previous section to derive optimal multistage tests of two
simple hypotheses. Consider the problem of deciding between two distinct
densities $f_0$ and $f_1$ by sampling the i.i.d. $X_1, X_2, \ldots$ in stages, while incurring
a cost per observation $c$, a cost per stage $d$ and a penalty $w_i$ for incorrectly
rejecting $f_i$. Specifically, a test of the hypotheses $H_0 : f_0$ versus $H_1 : f_1$ is a
triple $\delta = (\mathbf{N}, M, D)$, where $\mathbf{N}$, $M$ are as in Section 3 and $D$ is the "decision"
variable taking values in $\{0, 1\}$. The event $\{D = i\}$ means rejection of $H_{1-i}$.
Define the integrated risk of a test $\delta = (\mathbf{N}, M, D)$ with respect to the prior
$\pi$ to be

$r_{c,d}(\delta) = \sum_{i=0}^{1} \pi_i [c E_i N + d E_i M + w_i P_i(D = 1 - i)]$,

where $\pi_i, c, d, w_i > 0$. Let $\delta^* = (\mathbf{N}^*, M^*, D^*)$ denote a Bayes test, one that
achieves integrated risk $r^*_{c,d} = \inf_\delta r_{c,d}(\delta)$. In this section we define a family
of tests and show in Theorems 4.1 and 4.2 that they minimize the integrated
risk to second-order as $c, d \to 0$. Moreover, the proofs of these results in the
Appendix show that the integrated risk of efficient procedures is dominated
by sampling and staging costs; hence, this Bayesian setup can be thought of
as a stepping stone to finding tests that are efficient in the frequentist sense
as well.

As in Section 3, we assume that $c, d \to 0$ at rates such that $\{(c,d)\} \in
\mathcal{B}_m$ for some $m \ge 1$. For $i = 0, 1$, let $\sigma_i^2 = \operatorname{Var}_i \log(f_i(X_1)/f_{1-i}(X_1))$ and
$Y^{(i)}_j = \sigma_i^{-1} \log(f_i(X_j)/f_{1-i}(X_j))$ for $j = 1, 2, \ldots$ so that $E_i Y^{(i)}_j = \sigma_i^{-1} I_i$ and
$\operatorname{Var}_i Y^{(i)}_j = 1$, where $I_i = E_i \log(f_i(X_1)/f_{1-i}(X_1))$. Whenever we speak of a
power one test of $f_i$ versus $f_{1-i}$ (i.e., a test which can only reject $H_{1-i} : f_{1-i}$)
below, we will always mean the one defined with respect to $Y^{(i)}_1, Y^{(i)}_2, \ldots$,
which we assume satisfy one of (2.3)-(2.5). Let $l_n = \prod_{j=1}^{n} (f_0(X_j)/f_1(X_j))$
denote the likelihood ratio, and when there is no confusion which $\mathbf{N}$ we are
considering, we will let $l^k = l_{N^k}$.

To describe the family of optimal tests, we must consider separately two
cases of the relationship between $f_0$ and $f_1$. The first case, considered in
Section 4.1, is when $I_0 = I_1$ and $\operatorname{Var}_0 X_i = \operatorname{Var}_1 X_i$. This is the "symmetric"
case in the sense that the two corresponding power one tests dictate
the same initial stage size, and hence, their first stages can be applied
simultaneously. This case is of interest because it contains, most notably, the
Normal mean problem, $H_0 : \mu = \mu_0$ versus $H_1 : \mu = \mu_1$, about the mean $\mu$ of
Normal random variables with known variance, and the symmetric Binomial
case, $H_0 : p = 1/2 - \Delta$ versus $H_1 : p = 1/2 + \Delta$, about the probability
$p$ of success of a Bernoulli trial. If $I_0 \ne I_1$, the nature of a Bayes test is
fundamentally different. In this case, considered in Section 4.2, the ratio of
the two initial stages given by the power one tests does not tend to 1, and it
is not obvious what the size of the initial stage should be. This gives rise to
a necessary "exploratory" first stage, equal to the smaller of the two initial
stages dictated by the two corresponding power one tests. The remaining
case, where $I_0 = I_1$ and $\operatorname{Var}_0 X_i \ne \operatorname{Var}_1 X_i$, is at present unsolved, but the
popular examples contained in the former and the generality of the latter
make our analysis sufficient for most purposes.

For simplicity, we present our results here for tests of two simple hypothe- 
ses, but these methods and results generalize immediately to tests of s > 2 
simple hypotheses. The asymptotically optimal test for s > 2 or for either 
subcase considered below for s = 2 may be loosely described as follows: Sam- 
ple at the first stage the size of the smallest first stage of the corresponding 
s(s — 1) power one tests, then continue sampling with the power one test of 
the most likely hypothesis versus the second most likely, according to the 
results of the first stage. 

4.1. Case I: $I_0 = I_1$ and $\operatorname{Var}_0 X_i = \operatorname{Var}_1 X_i$. Let $(\mathbf{N}^{(0)}, M^{(0)})$ be the
power one test of $f_0$ versus $f_1$ defined in Theorem 3.1 and let $(\mathbf{N}^{(1)}, M^{(1)})$
be the corresponding power one test of $f_1$ versus $f_0$. Under the assumptions
$I_0 = I_1$ and $\operatorname{Var}_0 X_i = \operatorname{Var}_1 X_i$, the two procedures $(\mathbf{N}^{(0)}, M^{(0)})$ and
$(\mathbf{N}^{(1)}, M^{(1)})$ dictate the same first stage size. Define the first stage of
$\delta = (\mathbf{N}, M, D)$ to be this common first stage size, $N_1 = N^{(0)}_1 = N^{(1)}_1$. If $l_{N_1} \ge 1$,
continue with $(\mathbf{N}^{(0)}, M^{(0)})$, stopping the first time $l_{N^k} \ge d^{-1}$ to reject $H_1$,
as dictated by $(\mathbf{N}^{(0)}, M^{(0)})$, or $l_{N^k} \le d$ to reject $H_0$. Otherwise, $l_{N_1} < 1$, so
switch and continue sampling with $(\mathbf{N}^{(1)}, M^{(1)})$ with the same stopping rule.
This test is second-order asymptotically optimal, recorded as Theorem 4.1.

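Theorem 4.1. If the $Y^{(i)}_j$ satisfy one of (2.3)-(2.5), $I_0 = I_1$, $\operatorname{Var}_0 X_i =
\operatorname{Var}_1 X_i$, and $\{(c,d)\} \in \mathcal{B}_m$, then

(4.1) $r^*_{c,d} = c I_0^{-1} \log d^{-1} + u_m(z^*) d + o(d)$,

(4.2) $r_{c,d}(\delta) = c I_0^{-1} \log d^{-1} + u_m(z^*) d + o(d)$

as $c, d \to 0$, where $z^* \in [-\infty, \infty)$ is the unique solution of

$\frac{\phi(z^*)}{1 - \Phi(z^*)} = \lim_{c,d \to 0} \frac{\kappa_m(\sigma_0^{-1} I_0) \, h_m(\sigma_0^{-1} \log d^{-1})}{d/c}$.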


4.2. Case II: $I_0 \ne I_1$. Assume $I_0 < I_1$. For $i = 0, 1$ let $\delta_i(c,d,z)$ denote
the power one test of $f_i$ versus $f_{1-i}$ (i.e., the test that can only reject $f_{1-i}$)
defined in Theorem 3.1 with generic parameters $c, d, z$. Given $\{(c,d)\} \in \mathcal{B}_m$,
define $z^*_0, z^*_1$ to be the unique solutions of the equations

(43) _H4)_ = lim NmWM^Il-V^^O 



1 - ${zq) c,d^o d/c 
(4.4) $\frac{\phi(z^*_1)}{1 - \Phi(z^*_1)} = \lim_{c,d \to 0} \frac{\kappa_m(\sigma_1^{-1} I_1) \, h_m(\sigma_1^{-1} \log d^{-1})}{d/c}$.

Define $\delta = (\mathbf{N}, M, D)$ as follows: Let the first stage of $\delta$ equal the

1st stage of $\delta_1(c,d,z^*_1) = \min_i \{$1st stage of $\delta_i(c,d,z^*_i)\}$.

After the first stage,

if $l^1 \le 1$, continue sampling with $\delta_1(c,d,z^*_1)$,

if $l^1 > 1$, switch and continue sampling with $\delta_0(l^1 c, l^1 d, z^*_0)$,

with the stopping rule

(4.5) stop after the $k$th stage and reject $H_0$ if $l^k \le d$,

(4.6) stop after the $k$th stage and reject $H_1$ if $l^k \ge d^{-1}$.

Note that $\delta$ stops no later than whichever power one test it chooses
after the first stage since $\delta_1(c,d,z^*_1)$ stops when $\sum_j Y^{(1)}_j \ge \sigma_1^{-1} \log d^{-1}$,
which is equivalent to (4.5), while $\delta_0(l^1 c, l^1 d, z^*_0)$ stops when
$\sum_{j > N_1} Y^{(0)}_j \ge \sigma_0^{-1} \log(l^1 d)^{-1}$, which is equivalent to (4.6). However, $\delta$ may
stop before the corresponding power one test because of the stopping rule
(4.5)-(4.6).
Theorem 4.2 establishes the second-order optimality of $\delta$.

Theorem 4.2. If the $Y^{(i)}_j$ satisfy one of (2.3)-(2.5), $I_0 < I_1$, and
$\{(c,d)\} \in \mathcal{B}_m$, then

(4.7) $r^*_{c,d} = \sum_{i=0}^{1} \pi_i \{c I_i^{-1} \log d^{-1} + d[1 - i + u_m(z^*_i)]\} + o(d)$,

(4.8) $r_{c,d}(\delta) = \sum_{i=0}^{1} \pi_i \{c I_i^{-1} \log d^{-1} + d[1 - i + u_m(z^*_i)]\} + o(d)$

as $c, d \to 0$, where $z^*_i$ is given by (4.3) and (4.4).

5. A numerical example. The tests proved asymptotically optimal in
Theorems 4.1 and 4.2 are asymptotic not only in the sense that their
optimality is proved in the limit as $c, d \to 0$, but also in that they are defined
in terms of the rates at which $c, d \to 0$. Thus, in practice, there may be
more than one asymptotically optimal procedure for a statistician to choose
from. In this section we describe one such procedure and give the results of
a numerical experiment comparing it to sampling with constant stage size.

Given values $0 < c, d < 1$, let

(5.1) $m^* = \inf\{m \ge 1 : \kappa_m(\mu_i) h_m(a_i + 1) - \kappa_{m+1}(\mu_i) h_{m+1}(a_i + 1) \le d/c\}$

for $i = 0, 1$, where $\mu_i = I_i/\sigma_i$ and $a_i = \sigma_i^{-1} \log d^{-1}$. Let $\delta$ be the test
whose first stage is the smaller of the two first stages of the samplers
$\delta^o_{m^*,d/c}(\sigma_i^{-1} \log d^{-1})$, and then continues sampling according to

C^/cK'logO if Z 1 > 1, 

(5.2) 

C^/cK'logO if^<l. 

The test $\delta$ is asymptotically optimal by Theorem 4.1 when $c, d \to 0$ such
that $\{(c,d)\} \in \mathcal{B}^o_m$ since, clearly, $m^*$ will equal $m$ for sufficiently small $c, d$.

We consider testing the hypotheses $H_0 : p = 0.4$ versus $H_1 : p = 0.6$ about
the probability $p$ of success of i.i.d. Bernoulli trials. To isolate the effects of
using variable stage sizes, we compare $\delta$ with the test $\delta_k$ that uses stages
of constant size $k$ but with the same stopping rule (4.5)-(4.6), that is, stop
when the log-likelihood exceeds $\log d^{-1}$ in absolute value. Table 1 contains
the expected sample size, expected number of stages and integrated risk of
$\delta$ and $\delta_k$ for various $k$, $c$ and $d$, each of which is computed by 100,000 Monte
Carlo replications. For each value of $d/c$, the operating characteristics of $\delta_k$
are given in Table 1 for the following five values of $k$: $k = 1$ (fully-sequential
sampling), the (rounded) "average stage size" $EN/EM$ of $\delta$, the size of
the first stage of $\delta$, the (rounded) expected sample size $EN$ of $\delta$ and the
optimal value $k = k^*$ minimizing $r_{c,d}(\delta_k)$, found by exhaustion. Here $E(\cdot)$
denotes $\sum_{i=0}^{1} \pi_i E_i(\cdot)$.

Table 1
Expected sample size, number of stages, integrated risk and 2nd order risk of $\delta$ and $\delta_k$ for
the binomial testing problem $p = 0.4$ vs. $p = 0.6$ with $\log d^{-1} = 10$, $\pi_i = 1/2$, $w_i = 1$

Test                                 EN       EM     r_{c,d}/d   r'_{c,d}/d   1 - r'_{c,d}(\delta)/r'_{c,d}

d/c = 5    (m^* = 7.0, \tilde r_{c,d}/d = 31.7)
\delta                              125.1      5.9      30.9        5.33
\delta_1                            124.9    124.9     151.0      125.43        95.8%
\delta_{21}                         141.3      6.7      35.4        9.83        45.8%
\delta_k, k = 1st stage of \delta   163.3      2.8      35.9       10.33        48.3%
\delta_{125}                        189.0      1.5      39.7       14.13        62.2%
\delta_{k^*=43}                     154.2      3.6      34.4        8.83        39.6%

d/c = 10   (m^* = 5.0, \tilde r_{c,d}/d = 17.3)
\delta                              130.9      4.3      17.4        4.07
\delta_1                            125.1    125.1     138.5      125.17        96.7%
\delta_{30}                         149.3      5.0      20.0        6.67        39.0%
\delta_{70}                         171.2      2.4      20.1        6.77        39.9%
\delta_{130}                        195.4      1.5      21.0        7.67        46.9%
\delta_{k^*=49}                     157.7      3.2      19.0        5.67        28.2%

d/c = 25   (m^* = 2.0, \tilde r_{c,d}/d = 6.9)
\delta                              144.4      2.6       8.36       2.43
\delta_1                            124.9    124.9     131.0      125.07        98.1%
\delta_{56}                         163.8      2.9       9.92       3.99        39.1%
\delta_k, k = 1st stage of \delta   176.8      2.0       9.06       3.13        22.4%
\delta_{144}                        201.0      1.4       9.66       3.73        34.9%
\delta_{k^*=95}                     178.8      1.9       9.04       3.11        21.9%

Since both $\delta$ and $\delta_k$ sample until the absolute value
of the log-likelihood ratio exceeds $\log d^{-1}$, the cost of the average number of
observations required to do this and the cost of the first stage represent "fixed
costs," which any efficient test must incur, as shown in Lemma A.4 in the
Appendix. We obtain a more accurate comparison of the efficiency due
to variable stage size sampling by considering the second-order risk
$r'_{c,d} = r_{c,d} - (c EN^{(1)} + d)$, where $N^{(1)}$ is the sample size of $\delta_{k=1}$. The fifth
column of Table 1 contains the second-order risk, and the sixth column
contains its percent decrease achieved by $\delta$. Also included in Table 1 are the
asymptotic approximations $m^*$ [given by (5.1)] of the optimal expected
number of stages and $\tilde r_{c,d} = c \log d^{-1}/I + m^* d$ of the Bayes integrated risk.
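
The approximation $c \log d^{-1}/I + m^* d$ can be reproduced directly. The Python sketch below (an illustration; the function names are hypothetical) evaluates it in units of $d$ for the Bernoulli problem $p = 0.4$ versus $p = 0.6$ with $\log d^{-1} = 10$. The values of $m^*$ are copied from Table 1 as read from the source rather than recomputed from (5.1).

```python
import math

def kl_information(p0, p1):
    """Kullback-Leibler information of Bernoulli(p0) relative to Bernoulli(p1)."""
    return p0 * math.log(p0 / p1) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p1))

def risk_approximation_over_d(d_over_c, m_star, log_d_inv=10.0, p0=0.4, p1=0.6):
    """(c * log(1/d) / I + m_star * d) / d = log(1/d) / ((d/c) * I) + m_star."""
    return log_d_inv / (d_over_c * kl_information(p0, p1)) + m_star

if __name__ == "__main__":
    # (d/c, m*) pairs as listed in Table 1.
    for ratio, m_star in ((5, 7), (10, 5), (25, 2)):
        print(ratio, round(risk_approximation_over_d(ratio, m_star), 1))
    # prints 31.7, 17.3 and 6.9 for d/c = 5, 10 and 25
```

These agree with the approximations reported alongside $m^*$ in Table 1.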

The results show that $\delta$ has substantially smaller risk and second-order
risk than the $\delta_k$. Since $\delta$ and $\delta_k$ use the same stopping rule, this is due to
the variable stage sizes of $\delta$, versus the constant stage sizes of $\delta_k$. Even when
compared to $\delta_{k^*}$ with the optimal fixed stage size $k^*$ (which requires fitting
an additional parameter), $\delta$ has roughly 40%, 30% and 20% smaller
second-order risk for $d/c = 5$, 10 and 25, respectively. The degree of improvement
decreases for larger values of $d/c$ as is anticipated since the expected number
of stages of any reasonable test approaches 1 in this limit. Note that for
each value of $d/c$, the expected number of stages of $\delta$ is larger than that
of $\delta_{k^*}$, while the expected sample size is smaller. Thus, the way $\delta$ varies its
stage sizes allows it to have more interim looks (stages), while keeping its
overshoot, and hence, expected sample size, small. The expected number
of stages and integrated risk of $\delta$ are close to their approximations $m^*$ and
$\tilde r_{c,d}$. Here the test $\delta$ was constructed from the samplers $\delta^o_{m,d/c}$. However, tests
designed from the samplers $\delta^+_{m,z}$ also perform well in practice and behave
almost identically to those constructed from the samplers $\delta^o_{m,d/c}$.

A natural question to ask is what values of $c$, $d$ should be used in practice
if one is not comfortable specifying them as "costs." The theory of the tests
in Section 4 yields that $\log d^{-1}/I$ is an asymptotic approximation of the
expected sample size and that $d$ is an asymptotic upper bound on the type
I and II error probabilities. Hence, one could first choose $d$ to be the desired
error probability or so that $\log d^{-1}/I$ is an acceptable expected sample size,
and then choose $c$ so that $m^*$ is an acceptable expected number of stages,
using (5.1).
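
The first step of this recipe is simple enough to sketch in code. The helpers below are hypothetical and only illustrate the relation between $d$ and the approximation $\log d^{-1}/I$ of the expected sample size; the argument `info` stands for the information number $I$ of the testing problem and must be supplied by the user. The second step, choosing $c$ through (5.1), is not reproduced here.

```python
import math


def approx_expected_sample_size(d: float, info: float) -> float:
    """Asymptotic approximation log(1/d) / I of the expected sample size."""
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie in (0, 1)")
    if info <= 0.0:
        raise ValueError("info must be positive")
    return math.log(1.0 / d) / info


def d_for_sample_size(n: float, info: float) -> float:
    """Value of d for which log(1/d) / I equals the target sample size n."""
    if n <= 0.0 or info <= 0.0:
        raise ValueError("n and info must be positive")
    return math.exp(-n * info)


# Example with made-up inputs: error bound d = 0.01 and information number 0.1.
n_approx = approx_expected_sample_size(0.01, 0.1)
d_back = d_for_sample_size(n_approx, 0.1)
print(n_approx, d_back)
```

Either function can be used as the starting point: fix the error bound $d$ and read off the approximate sample size, or fix an acceptable sample size and read off the implied $d$.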

APPENDIX 

A.1. Proof of Theorems 2.1 and 2.2. As mentioned above, the samplers
$\delta^o_{m,h}$ and $\delta^+_{m,z}$ are extensions of Bartroff's [3] samplers for Brownian
motion, and otherwise differ only slightly in their final stages. Moreover,
Theorems 2.1 and 2.2 are extensions of Theorems 2.3 and 2.4 of Bartroff [3],
requiring only two additional tools: first, justification for replacing the
expected overshoot $E(\sum X_i - a; \sum X_i > a)$ by that of the normal distribution;
second, bounds on the operating characteristics of the bold sampling $\delta_n$ used
in the final stages. With these two tools, the proofs of the corresponding
theorems in Bartroff [3] can be followed almost exactly. We therefore state and
prove these two needed tools here as Lemmas A.1 and A.2 and refer the
reader to Bartroff [3] for the rest of the proof of Theorems 2.1 and 2.2.
We also state without proof the auxiliary Lemma A.3 needed in the sequel,
which is a simple extension of Lemma 2.4 of Bartroff [3] in the same manner.

Recall that $\psi^+(z) = \phi(z) - z\Phi(-z) = \int_z^\infty \Phi(-x)\,dx$. If $X_i$ are i.i.d. $N(\mu,1)$
and $\Sigma_n = \sum_{i=1}^n X_i$, then

$E(\Sigma_n - x; \Sigma_n > x) = \int_x^\infty P(\Sigma_n > y)\,dy = \sqrt{n} \cdot \psi^+\left(\frac{x - n\mu}{\sqrt{n}}\right)$.

Lemma A.1 shows that these two quantities are asymptotically equivalent
in a certain range even when the $X_i$ are not normal, given that one of (2.3)-(2.5)
holds.
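
The identity $\phi(z) - z\Phi(-z) = \int_z^\infty \Phi(-x)\,dx$ recalled above is easy to confirm numerically. The following sketch uses only the Python standard library; the function names, the truncation point `upper` and the number of trapezoid steps are illustrative choices, not part of the paper.

```python
import math


def phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z: float) -> float:
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def psi_plus(z: float) -> float:
    """Closed form phi(z) - z * Phi(-z)."""
    return phi(z) - z * Phi(-z)


def psi_plus_by_quadrature(z: float, upper: float = 12.0, steps: int = 20000) -> float:
    """Trapezoid approximation of the integral of Phi(-x) over [z, z + upper]."""
    h = upper / steps
    total = 0.5 * (Phi(-z) + Phi(-(z + upper)))
    for i in range(1, steps):
        total += Phi(-(z + i * h))
    return total * h


for z in (-1.0, 0.0, 1.5):
    print(z, psi_plus(z), psi_plus_by_quadrature(z))
```

The integrand decays like a normal tail, so truncating the integral a fixed distance above $z$ loses a negligible amount for moderate $z$.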
Lemma A.1. Let the $X_i$ be i.i.d. and satisfy one of (2.3)-(2.5). Let $a_n$
be a sequence such that

$\lim_{n\to\infty} \frac{a_n - n\mu}{\sqrt{n}} \in (-\infty,\infty)$  or  $\sqrt{(2-\varepsilon)\log n} \ge \frac{a_n - n\mu}{\sqrt{n}} \to \infty$

for some $\varepsilon \in (0,1)$ as $n \to \infty$. Then as $n \to \infty$,

(A.3)  $P(\Sigma_n > a_n) \sim 1 - \Phi\left(\frac{a_n - n\mu}{\sqrt{n}}\right)$,

(A.4)  $E(\Sigma_n - a_n; \Sigma_n > a_n) \sim \sqrt{n} \cdot \psi^+\left(\frac{a_n - n\mu}{\sqrt{n}}\right)$.

Proof. Let $T_n = (\Sigma_n - n\mu)/\sqrt{n}$ and $b_n = (a_n - n\mu)/\sqrt{n}$. Assume that
$b_n \to \infty$; otherwise, (A.3) holds by the central limit theorem. If (2.3) or
(2.4) holds, then Theorem 4.6 of Hall [10] shows that $|P(T_n > x) - \Phi(-x)| = O(1/n)$
uniformly in $x$. Then

$\left| \frac{P(T_n > b_n)}{\Phi(-b_n)} - 1 \right| \le \frac{O(1/n)}{\Phi(-b_n)} \le \frac{O(1/n)}{\Phi(-\sqrt{(2-\varepsilon)\log n})} = O(n^{-\varepsilon/2}\sqrt{\log n}) = o(1)$.

If (2.5) holds, then (A.3) holds by Cramér's theorem (e.g., see Feller [9],
Theorem XVI.7.1).

Since $E(\Sigma_n - a_n; \Sigma_n > a_n) = \sqrt{n} \int_{b_n}^\infty P(T_n > x)\,dx$, to establish (A.4) it
suffices to show that $\int_{b_n}^\infty P(T_n > x)\,dx \sim \psi^+(b_n)$. First assume that $b_n \to \infty$
such that $b_n \le \sqrt{(2-\varepsilon)\log n}$. Choose $c_n \to \infty$ such that $b_n + \varepsilon' \le c_n \le \sqrt{(2-\varepsilon'')\log n}$,
some $\varepsilon', \varepsilon'' > 0$. Then

(A.5)  $\frac{\phi(c_n)}{\psi^+(b_n)} \sim \frac{b_n^2\,\phi(c_n)}{\phi(b_n)} \le b_n^2 e^{-\varepsilon' b_n} \to 0$

since $\psi^+(x) \sim \phi(x)/x^2$ as $x \to \infty$. Write $\int_{b_n}^\infty = \int_{b_n}^{c_n} + \int_{c_n}^\infty$. By (A.3),

(A.6)  $\int_{b_n}^{c_n} P(T_n > x)\,dx \sim \int_{b_n}^{c_n} \Phi(-x)\,dx = \psi^+(b_n) - \psi^+(c_n) \sim \psi^+(b_n)$

since $\psi^+(c_n) \le \phi(c_n) = o(\psi^+(b_n))$. For the other term,

$\int_{c_n}^\infty P(T_n > x)\,dx = E(T_n; T_n > c_n) - c_n P(T_n > c_n)$

by integration by parts and $c_n P(T_n > c_n) \sim c_n \Phi(-c_n) = o(\psi^+(b_n))$ by Mills'
ratio and (A.5). By Schwarz's inequality, the other piece is

$E(T_n; T_n > c_n) \le \sqrt{ET_n^2 \cdot E1\{T_n > c_n\}^2} = \sqrt{1 \cdot P(T_n > c_n)} \sim \sqrt{\Phi(-c_n)} = o(\psi^+(b_n))$

by an argument like (A.5). These last two estimates give $\int_{c_n}^\infty P(T_n > x)\,dx = o(\psi^+(b_n))$,
which with (A.6) gives $\int_{b_n}^\infty P(T_n > x)\,dx \sim \psi^+(b_n)$.

If $b_n \to b \in (-\infty,\infty)$, there is $T'_n$ with the same distribution as $T_n$ such
that $T'_n \to Z \sim N(0,1)$ a.s. by weak convergence, and hence also in $L^1$ by
uniform integrability (e.g., see Durrett [6], Theorems 2.1 and 5.2). Thus,

$\int_{b_n}^\infty P(T_n > x)\,dx = E(T'_n - b_n; T'_n > b_n) \to E(Z - b; Z > b) = \psi^+(b)$. □

Lemma A.2. Let $n(x)$ be a positive integer-valued function and let
$z(x) = (x - \mu n(x))/\sqrt{n(x)}$. If $n(x)$ is such that $z(x) \to -\infty$ and

(A.7)  $|z(x)| \le [\sqrt{(2-\varepsilon)\log n(x)} \wedge \sqrt{x}]$

for some $\varepsilon \in (0,1)$ as $x \to \infty$, then $\delta_{n(a)}(a) = (N,M)$ satisfies

(A.8)  $EN \le a/\mu + O(|z(a)|\sqrt{a})$

and $EM \to 1$ as $a \to \infty$.

Proof. Denote $n = n(a)$, $n_2 = [n^{1/2}]$ and $z = z(a)$. Suppose $y > 0$. It is
well known from sequential theory that

$E(M - 1 \mid a - S_1 = y) \le y/(\mu n_2) + O(1) = y/(\mu\sqrt{n}) + O(1)$

as $n \to \infty$ uniformly in $y$. Thus,

(A.9)  $E(M - 1) = E(M - 1; S_1 < a) \le \frac{1}{\mu\sqrt{n}} E(a - S_1; S_1 < a) + O(1)P(S_1 < a)$

and $(-a + \mu n)/\sqrt{n} = |z| \le \sqrt{(2-\varepsilon)\log n}$, so by Lemma A.1,

(A.10)  $E(a - S_1; S_1 < a) \sim \sqrt{n} \cdot \psi^+(|z|) \sim \sqrt{n} \cdot \frac{\phi(z)}{z^2}$.

Also, since $P(S_1 < a) = P((S_1 - \mu n)/\sqrt{n} < z) \to 0$, (A.9) becomes $EM \to 1$. To
show that (A.8) holds, write $EN = n + n_2 \cdot E(M - 1) = n + o(\sqrt{n})$. We have
$n = a/\mu + O(|z|\sqrt{a})$ by (A.7), so

$EN = a/\mu + O(|z|\sqrt{a}) + o(\sqrt{a}) = a/\mu + O(|z|\sqrt{a})$. □

Lemma A.3. If $h \in \mathcal{B}_m$ and $\delta$ is any sampler such that $R_h(\delta) = O(h(a))$,
then for any $\varepsilon > 0$ and $1 \le k < m$, as $a \to \infty$,

PCo-^^CI-^CI/m) 1 -^)*^ (a))-l. 
A.2. Proof of Theorem 3.1. Let $a = \sigma^{-1}\log d^{-1}$ and $\{(c,d)\} \in \mathcal{B}^o_m$ so that
$u_m(z^*) = m$. Let $a^* = \log d^{-1} + o(1)$ be that given by Lemma A.4 below.
Then

$h_k(\sigma^{-1}a^*) \sim \sigma^{-(1/2)^k} h_k(a^*) \propto h_k(\log d^{-1} + o(1)) \sim h_k(\log d^{-1})$

since $(d/dx)h_k(x)$ is bounded for large $x$; thus,

(A.11)  $h_m(\sigma^{-1}a^*) \ll d/c \ll h_{m-1}(\sigma^{-1}a^*)$

as a consequence of $\{(c,d)\} \in \mathcal{B}^o_m$. Let $\delta^* = (N^*, M^*)$ denote a Bayes power
one test. By Lemma A.4, we know that $\sum_{i=1}^{N^*} Y_i = \sigma^{-1}\log l_{N^*} \ge \sigma^{-1}a^*$, so $\delta^*$
is a multistage sampler with boundary $\sigma^{-1}a^*$. Theorem 2.1 gives

(A.12)  $R^*_{c,d} \ge cE_0N^* + dE_0M^*$
        $= c[E_0(N^* - a^*/I_0) + (d/c)E_0M^*] + ca^*/I_0$
        $\ge c[m(d/c) + o(d/c)] + cI_0^{-1}(\log d^{-1} + o(1))$
        $= cI_0^{-1}\log d^{-1} + d \cdot m + o(d)$
        $= cI_0^{-1}\log d^{-1} + d \cdot u_m(z^*) + o(d)$.

Also by the $\mathcal{B}^o_m$ case of Theorem 2.1, for $\delta = (N,M)$,

(A.13)  $E_0(N - I_0^{-1}\log d^{-1}) + (d/c)E_0M \le m(d/c) + o(d/c)$.

Then

(A.14)  $R_{c,d}(\delta) - P_1(N < \infty) = c[E_0(N - I_0^{-1}\log d^{-1}) + (d/c)E_0M] + cI_0^{-1}\log d^{-1}$
        $\le c[m(d/c) + o(d/c)] + cI_0^{-1}\log d^{-1}$  [by (A.13)]
        $= cI_0^{-1}\log d^{-1} + d \cdot m + o(d)$
        $= cI_0^{-1}\log d^{-1} + d \cdot u_m(z^*) + o(d)$,

so it suffices to show that $P_1(N < \infty) = o(d)$. The right-hand side of (A.13)
is $O(d/c)$, so by Lemma A.3 (with $\sigma^{-1}I_0$ in place of $\mu$),

P (a - S m _! > (l/2)( CT " 1 / )" 1+ ( 1 / 2 ) m - 1 F]™- 1) (a)) - 1 
as c, d — > 0. On the above event C/, 

a - S m _x > (l/2)(a- 1 /)- 1+ ( 1 /2) m - 1 j p(-- 1 ) (a) > ^ m(a) 2 

for some 77 > by Lemma 2.5 of Bartroff [3]. On U, the mth stage of 5 = 
S m,d/d a ) De g ins Dold sampling. Letting p m = [(5 m -5 m _i)-cr" 1 / A^m]/v / ^m ) 



a-S m -i-a 1 I N m h m (a) 



Po(S m >a+ \ h m {a)\U) =P (p m > ' — + 
if h m (a) <C N m on U, since (a — S m -\ — a~ 1 IoN m ) / y/N m — ► — oo by definition 
of 5^ d / c {a). This holds since 

/a -i r\ ^ a - gm-l r]h m (a) 2 

(A. 15) A m > — — — > — — > ft m (a) 



on [7 Let V = U D {S m > a + ^/i m (a)} so that P (V) -> 1. Using Wald's 
likelihood ratio identity, the relation l n = exp(cr^i Yi), and letting primes 
denote complements, 

Pi (A < oo) = E (l^;N< oo) < ^o^ 1 

= E [exp(-aS m ); V] + E [exp(-aS M ); V] 



<exp(-logd 1 -o- A //i m (a)) + J E [exp(-logd 1 );V] 



= d • exp(-a^//t m (a)) + d • P (F') 
= d-o(l) + d-o(l) = o(d), 

proving that (3.6) holds in the $\{(c,d)\} \in \mathcal{B}^o_m$ case.

Now let $\{(c,d)\} \in \mathcal{B}^+_m$. By using the corresponding $\mathcal{B}^+_m$ cases of the results
used in the arguments leading to (A.12) and (A.14),

$R^*_{c,d} \ge cI_0^{-1}\log d^{-1} + d \cdot u_m(z^*) + o(d) \ge R_{c,d}(\delta) - P_1(N < \infty)$,

so it again suffices to show that $P_1(N < \infty) = o(d)$. Let $U$ be as above
and $W_1 = \{S_m \ge a + \sqrt{h_m(a)}\}$, $W_2 = \{S_m < a - \sqrt{h_m(a)}\}$, $W_3 = \{S_{m+1} \ge a + (h_m(a))^{1/5}\}$
and $W = (U \cap W_1) \cup (U \cap W_2 \cap W_3)$. We will show that
$P_0(W) \to 1$ as $d \to 0$, which will allow us to say that the likelihood ratio
is large enough at the end of the $m$th stage (on $W_1$) or at the end of the
$(m+1)$st stage (on $W_3$) that $P_1(N < \infty) = o(d)$:



p (u n Wi) = p (Wi\u)p (u) ~ p (Wi | it) 

lh-(n\ 

u 



a-S m -i-a 1 hN m h m (a) 
"o Pm ^ 7== r 



.'Am V ^ 

and (a — 5 m _i — (J~ 1 Iq N m )/ y/N m — > z* on J7 by definition of 5+ 2 * (a) . Then 
P (U n Wi) — >• 1 - $(z*) by the central limit theorem if y/h m (a) < v 7 ^™ 
on [/, which holds by (A. 15). Next, write 

P (c/ n w 2 n w 3 ) = p ([/)Po(w 2 |c/)Po(w 3 |c/n w 2 ) 
~ p (w 2 \u)p (w 3 \u nw 2 ). 

We have P(W / 2 |[7) — ► $(z*) by an argument like that above. Also, 
r> n*r \TT^ixr \ of ^ a - S m - a" 1 1 N m+1 (/i m (a)) 1/5 . 

p (w 3 \u nw 2 ) = p ( pm+i > nn — + — ur\Wi 



m+1 
which approaches 1 since 



0W>J^ > %^ » (M«)) 1/5 

V cr V<7 i 

and (a — S m — a^ 1 IoN m+ i) / ^ N m+ \ — > — oo on [/ n W2 by definition of 
5^ z * (a). Combining these, we have Po(f H W2 fl W3) — ► <5(z*) and, hence, 

p (vf) = P (^ n Wi) + p (^ n w 2 n w 3 ) -» 1 - *(**) + = l. 

Note that on W, S M -a> y/h m (a) A (/^(a)) 1 / 5 = (/^(a)) 1 / 5 , so 
Pi(iV < 00) = E (l^;N< 00) < Pq^ 1 

= E [exp(-aS M ); W] + £h[exp(-<TSk); W] 

< exp(-log^ 1 - ^(^(a)) 1 / 5 ) + B [exp(-logd- 1 );^ / ] 

= d ■ exp(- C r(/ lm (a)) 1 / 5 ) + d ■ P (W) 

= d-o(l) + d-o(l) = o(d), 
finishing the proof. 
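
The use of Wald's likelihood ratio identity in this proof can be illustrated by simulation. The sketch below is not the multistage test of the paper: it assumes a toy fully-sequential rule for $N(\theta,1)$ versus $N(-\theta,1)$ data that stops when the likelihood ratio $l_n = f_0/f_1$ first reaches $d^{-1}$, and every name and default value in it is hypothetical. It estimates $P_1(N < \infty)$ as $E_0(l_N^{-1}; N < \infty)$ by sampling under $P_0$, where stopping is fast, and each term in the average is at most $d$ by construction.

```python
import math
import random


def estimate_p1_crossing(theta=0.5, d=0.05, n_paths=2000, max_steps=10_000, seed=0):
    """Monte Carlo estimate of P_1(N < inf) via E_0[1/l_N; N < inf].

    Toy setting (an assumption, not the paper's test): observations are
    N(theta, 1) under P_0 and N(-theta, 1) under P_1, sampled one at a time.
    """
    rng = random.Random(seed)
    boundary = math.log(1.0 / d)
    total = 0.0
    for _ in range(n_paths):
        log_lr = 0.0
        for _ in range(max_steps):
            x = rng.gauss(theta, 1.0)      # draw under P_0
            log_lr += 2.0 * theta * x      # log f_0(x)/f_1(x) for this pair
            if log_lr >= boundary:
                total += math.exp(-log_lr)  # contributes 1/l_N <= d
                break
        # paths that never cross within max_steps contribute 0
    return total / n_paths


estimate = estimate_p1_crossing()
print(estimate)
```

Because of overshoot, the estimate is strictly below $d$; the proof above exploits the same effect, showing that on a high-probability event the overshoot is large enough to make the error probability $o(d)$.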

Lemma A.4. There exists $a^* = \log d^{-1} + o(1)$ such that $\log l_{N^*} \ge a^*$.

Proof. Suppose that a Bayes procedure has sampled $X_1, \ldots, X_n$ in $m$
stages. By the Bayes property, $\delta^*$ will stop at this point only if the stopping
risk is no greater than the continuation risk, that is, only if

(A.16)  $l_n^{-1} \le \rho(c, d, l_n^{-1})$,

where

$\rho(u,v,w) = \inf_{(N,M): N \ge 1} \{E_0(uN + vM) + wP_1(N < \infty)\}$.

Multiplication of (A.16) by $l_n$ yields $1 \le \rho(l_nc, l_nd, 1)$; hence, we consider
the function $\rho(t) = \rho(tc, td, 1)$ for $t > 0$, and note that (A.16) implies that
$\rho(l_{N^*}) \ge 1$. The function $\rho(t)$ is the infimum of a set of lines, each of slope
at least $c + d$ by virtue of the restriction on the infimum. Thus, $\rho(t)$ is
continuous, strictly increasing and satisfies $\rho(t) \ge t(c+d)$, so that

(A.17)  $\rho(t) \ge 1$ when $t \ge (c+d)^{-1}$.

If $(N',M')$ is the procedure that samples with constant stage size one (i.e.,
fully-sequential sampling) and an appropriately chosen boundary, then it
is well known (e.g., see Lorden [16]) that $P_1(N' < \infty) < 1$ and $E_0N' = E_0M' < \infty$,
and hence, $\rho(t) \le t(c+d)E_0N' + P_1(N' < \infty) < 1$ for sufficiently
small $t$. This and (A.17) imply that there is a unique number $e^{a^*}$ such that
$\rho(e^{a^*}) = 1$. Then $\log l_{N^*} = \log \rho^{-1}(\rho(l_{N^*})) \ge \log \rho^{-1}(1) = a^*$. To show that
$a^* = \log d^{-1} + o(1)$, let $Y_i$ be as in (3.4) and $\delta^o_{1,h}(a) = (N,M)$, the multistage
sampler described in Section 2 with $h(a) = a^{3/4}$ and $a = \sigma^{-1}\log(d/c)$. Since
$\sqrt{a} \ll h(a) \ll a$, by Lemma A.2, $E_0N - a(\sigma^{-1}I_0)^{-1} = o(h(a))$ and $E_0M \to 1$.
Also, $l_N^{-1} = \exp[-\sigma(Y_1 + \cdots + Y_N)] \le \exp[-\sigma a] = c/d$, so that

$\rho(t) \le E_0[t(cN + dM) + l_N^{-1}1\{N < \infty\}]$
       $\le tc[E_0(N - a(\sigma^{-1}I_0)^{-1}) + a(\sigma^{-1}I_0)^{-1} + (d/c)E_0M] + E_0 l_N^{-1}$
       $\le tc[o(h(a)) + a(\sigma^{-1}I_0)^{-1} + (d/c)(1 + o(1))] + c/d$
       $= tc[o(d/c) + (d/c)(1 + o(1))] + c/d = td(1 + o(1)) + c/d$.

This implies that $\rho(t) < 1$ when $t \le d^{-1}(1 + o(1))$; hence,

$a^* = \log \rho^{-1}(1) \ge \log \rho^{-1}(\rho(d^{-1}(1 + o(1)))) = \log d^{-1} + o(1)$.

On the other hand,

$a^* = \log \rho^{-1}(1) \le \log \rho^{-1}(\rho([c+d]^{-1}))$  [by (A.17)]
      $= \log(c+d)^{-1} = \log d^{-1} + o(1)$

since $d/c \to \infty$, establishing $a^* = \log d^{-1} + o(1)$. □
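
The last step of this proof rests on the elementary fact that $\log(c+d)^{-1}$ and $\log d^{-1}$ differ by $\log(1 + c/d)$, which vanishes as $d/c \to \infty$. A small sketch, with a hypothetical helper name and the ratios $d/c = 5$, 10 and 25 from the numerical study used purely as sample inputs, shows how quickly this gap shrinks:

```python
import math


def upper_bound_gap(c: float, d: float) -> float:
    """Difference log(1/d) - log(1/(c + d)), which equals log(1 + c/d)."""
    if c <= 0.0 or d <= 0.0:
        raise ValueError("c and d must be positive")
    return math.log(1.0 / d) - math.log(1.0 / (c + d))


d = 0.01  # arbitrary illustrative value
gaps = {ratio: upper_bound_gap(d / ratio, d) for ratio in (5, 10, 25)}
for ratio, gap in gaps.items():
    print(f"d/c = {ratio}: gap = {gap:.4f}")
```

The gap depends only on the ratio $d/c$, not on $d$ itself, which is why the upper bound in the proof is $\log d^{-1} + o(1)$ whenever $d/c \to \infty$.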

A.3. Proof of Theorems 4.1 and 4.2. 

Lemma A.5. Assume $\{(c,d)\} \in \mathcal{B}_m$ and let $z^* \in [-\infty,\infty)$ be the unique
solution of

$\frac{\phi(z^*)}{1 - \Phi(z^*)} = \lim_{c,d\to 0} \frac{\kappa_m(\sigma_0^{-1}I_0)\,h_m(\sigma_0^{-1}\log d^{-1})}{d/c}$.

Then

(A.18)  $cE_0N^* + dE_0M^* + P_1(D^* = 0) \ge cI_0^{-1}\log d^{-1} + u_m(z^*)d - o(d)$

as $c,d \to 0$.

Proof. We extend $\delta^*$ to a power one test of $f_0$ versus $f_1$ on the event
$\{D^* = 1\}$. Let $N = M = \inf\{n \ge 1: l_n \ge d^{-2}\}$ be fully-sequential sampling
with likelihood ratio boundary $d^{-2}$. Define $N' = N^* + N \cdot 1\{D^* = 1\}$ and
$M' = M^* + M \cdot 1\{D^* = 1\}$, the power one test that coincides with $\delta^*$ on
$\{D^* = 0\}$ but continues with the power one test $(N,M)$ on $\{D^* = 1\}$. Since
$\{N' < \infty\} = \{D^* = 0\} \cup \{D^* = 1, N < \infty\}$, we have

$cE_0N^* + dE_0M^* + P_1(D^* = 0)$
  $= c[E_0N' - E_0(N; D^* = 1)] + d[E_0M' - E_0(M; D^* = 1)]$
    $+ P_1(N' < \infty) - P_1(D^* = 1, N < \infty)$
  $= [cE_0N' + dE_0M' + P_1(N' < \infty)]$
    $- [cE_0(N; D^* = 1) + dE_0(M; D^* = 1) + P_1(D^* = 1, N < \infty)]$
  $= R_1 - R_2$.

By Theorem 3.1, $R_1 \ge cI_0^{-1}\log d^{-1} + u_m(z^*)d + o(d)$, so to show that (A.18)
holds, it suffices to show that $R_2 = o(d)$. Write

(A.19)  $R_2 \le [cE_0(N \mid D^* = 1) + dE_0(M \mid D^* = 1)]P_0(D^* = 1) + P_1(N < \infty \mid D^* = 1)$.

It is well known that

$E_0(N \mid D^* = 1) = E_0(M \mid D^* = 1) = I_0^{-1}\log d^{-2} + O(1) = O(\log d^{-1})$.

We will show below that there is a $K < \infty$ such that

(A.20)  $l_{N^*} \le Kd$ on $\{D^* = 1\}$.

Using this and Wald's likelihood identity,

$P_0(D^* = 1) = E_1(l_{N^*}; D^* = 1, N^* < \infty) \le Kd = O(d)$.

Combining these two estimates gives

(A.21)  $[cE_0(N \mid D^* = 1) + dE_0(M \mid D^* = 1)]P_0(D^* = 1) = [c \cdot O(\log d^{-1}) + d \cdot O(\log d^{-1})]O(d) = O(d^2\log d^{-1})$.

By definition of $(N,M)$,

$P_1(N < \infty \mid D^* = 1) = E_0(l_N^{-1}1\{N < \infty\} \mid N > 0) \le E_0(d^2 1\{N < \infty\} \mid N > 0) \le d^2$.

Plugging this and (A.21) into (A.19) gives $R_2 \le O(d^2\log d^{-1}) + d^2 = o(d)$.

To verify (A.20), write the posterior risk of rejecting $H_i$ after the $k$th
stage as

(A.22)  $r_{0k} = \frac{\pi_0 w_0 l_{N^*_k}}{\pi_0 l_{N^*_k} + \pi_1}$,  $r_{1k} = \frac{\pi_1 w_1}{\pi_0 l_{N^*_k} + \pi_1}$

and let $r_k = r_{0k} \wedge r_{1k}$, the stopping risk after the $k$th stage. A Bayes test
stops sampling if the stopping risk is less than all possible continuation
risks. One possible continuation is fully-sequential sampling. By Lemma 2
of Lorden [14] there is a constant $K^* < \infty$ such that a Bayes procedure
can only stop when the continuation risk of fully-sequential sampling is less
than $K^*$ times the cost per observation, $c + d$ in this case. Thus, $r_{M^*} \le K^*(c+d) \le 2K^*d$,
meaning $r_{0M^*} \le 2K^*d$ or $r_{1M^*} \le 2K^*d$. If $r_{0M^*} \le 2K^*d$,
then by the first relation in (A.22) and some simple algebra,

$l_{N^*} \le \frac{\pi_1 \cdot 2K^*d}{\pi_0(w_0 - 2K^*d)} \le \frac{4\pi_1K^*}{\pi_0 w_0}\,d$

for small enough $d$. Clearly, $r_{0M^*} < r_{1M^*}$ in this case, so we can be sure
$D^* = 1$. Otherwise, $r_{1M^*} \le 2K^*d < r_{0M^*}$ for small $d$, so $D^* = 0$. □

Proof of Theorem 4.1. Let $I = I_0 = I_1$ and $\sigma = \sigma_0 = \sigma_1$. Rearranging
terms,

(A.23)  $r^*_{c,d} = \sum_{i=0}^{1} \pi_{1-i}w_{1-i}[c_iE_iN^* + d_iE_iM^* + P_{1-i}(D^* = i)]$,

where $c_i = c\pi_i/(\pi_{1-i}w_{1-i})$ and $d_i = d\pi_i/(\pi_{1-i}w_{1-i})$. It is simple to verify
that $\{(c_i,d_i)\} \in \mathcal{B}_m$ and

$\lim_{c,d\to 0} \frac{\kappa_m(\sigma^{-1}I)\,h_m(\sigma^{-1}\log d_i^{-1})}{d_i/c_i} = \frac{\phi(z^*)}{1 - \Phi(z^*)}$.

By Lemma A.5,

$c_iE_iN^* + d_iE_iM^* + P_{1-i}(D^* = i) \ge c_iI^{-1}\log d_i^{-1} + u_m(z^*)d_i + o(d_i)$,

and plugging this into (A.23) gives

$r^*_{c,d} \ge \sum_{i=0}^{1} \pi_{1-i}w_{1-i}[c_iI^{-1}\log d_i^{-1} + u_m(z^*)d_i + o(d_i)]$
         $= \sum_{i=0}^{1} \pi_i[cI^{-1}\log d^{-1} + u_m(z^*)d + o(d)]$
         $= cI^{-1}\log d^{-1} + u_m(z^*)d + o(d)$,

establishing (4.1). For an event $A$, denote

(A.24)  $r_{c,d}(\delta; A) = \sum_{i=0}^{1} \pi_i[cE_i(N; A) + dE_i(M; A) + w_iP_i(D = 1-i, A)]$.

Obviously $r_{c,d}(\delta; A) + r_{c,d}(\delta; A') = r_{c,d}(\delta)$. Let

$A_0 = \{|\log l^1 - IN_1| \le \sigma\sqrt{\beta N_1\log N_1}\}$,
$A_1 = \{|-\log l^1 - IN_1| \le \sigma\sqrt{\beta N_1\log N_1}\}$,

where $\beta = 2 - (1/2)^{m-1}$. The following bounds are proved below:

(A.25)  $cE_0(N; A_0) \le cE_0N^{(0)} + o(d)$,
(A.26)  $dE_0(M; A_0) \le dE_0M^{(0)} + o(d)$,
(A.27)  $P_0(D = 1, A_0) = o(d)$,
(A.28)  $cE_1(N; A_0) = o(d)$,
(A.29)  $dE_1(M; A_0) = o(d)$,
(A.30)  $P_1(D = 0, A_0) \le P_1(N^{(0)} < \infty) + o(d)$.

Using these bounds,

$r_{c,d}(\delta; A_0) = \sum_{i=0}^{1} \pi_i[cE_i(N; A_0) + dE_i(M; A_0) + w_iP_i(D = 1-i, A_0)]$
  $\le \pi_0[cE_0N^{(0)} + dE_0M^{(0)} + o(d)] + \pi_1w_1[P_1(N^{(0)} < \infty) + o(d)]$
  $= \pi_1w_1[c_0E_0N^{(0)} + d_0E_0M^{(0)} + P_1(N^{(0)} < \infty)] + o(d)$
  $= \pi_1w_1[c_0I^{-1}\log d_0^{-1} + u_m(z^*)d_0 + o(d_0)] + o(d)$  (by Theorem 3.1)
  $= \pi_0[cI^{-1}\log d^{-1} + u_m(z^*)d] + o(d)$

and the same argument with the indices reversed yields

(A.31)  $r_{c,d}(\delta; A_1) \le \pi_1[cI^{-1}\log d^{-1} + u_m(z^*)d] + o(d)$.

Now we consider $r_{c,d}(\delta; A_0' \cap A_1')$. Let $\widetilde{A} = A_0' \cap A_1'$. The bounds

(A.32)  $cE_0(N; \widetilde{A}) = o(d)$,
(A.33)  $dE_0(M; \widetilde{A}) = o(d)$,
(A.34)  $P_0(D = 1, \widetilde{A}) = o(d)$,

are also proved below. These bounds give $r_{c,d}(\delta; \widetilde{A}) = o(d)$. Combining this
with (A.31) gives

$r_{c,d}(\delta) = r_{c,d}(\delta; A_0) + r_{c,d}(\delta; A_1) + r_{c,d}(\delta; \widetilde{A})$
  $\le \sum_{i=0}^{1} \pi_i[cI^{-1}\log d^{-1} + u_m(z^*)d] + o(d)$
  $= cI^{-1}\log d^{-1} + d \cdot u_m(z^*) + o(d)$,

establishing (4.2). All that remains is to verify the bounds (A.25)-(A.30)
and (A.32)-(A.34).
Let Yj = Yj ^ and n{x,z) be as in Section 2 with a~ l I in place of fi. We 
begin by proving the crude bound 

(A.35) Ei(N\U) = CKlogfT 1 ) for any U such that Ei{M\U) = 0(1), 

i = 0, 1. If {(c, a!)} € £>£j, the mth stage begins bold sampling, in which each 
stage is bounded by max xg { 2 , +)a ,_} n(x, — [\Jd/c/x 1 l A A \J (3/2) log(x + 1)]), 
where :r + = a -1 logd -1 — J2Yj an d = ~ (~ a ~ l logd -1 ). If S does 
not stop at the end of a stage, then it must be that IXI^jl < c^logd -1 ; 
hence, x + and x_ are both bounded above by 2<7 _1 logd _1 . Then the sizes 
of stages m, m + 1, . . . are all bounded by 

n(x, -[Jdfc/x 1 ^ A {l/2)^x])\ x=2a - Hogd -, = O(logd^). 
The sizes of the first m — 1 stages are likewise bounded by 

max n(x, J (1 - 2 k ) log(x + 1)) < n(x, J (1/2) log(x + l))| x=2ff -i iogd-i 

= 0(logd- 1 ) 

for some k > 1. Thus, the size of each stage of 5 is uniformly 0(logd _1 ) 
and therefore, Ei(N\U) < Oflogd-^^Ml*/) =0(logd~ 1 ). This holds if 
{(c, d)} E £>+ as well since then the (m + l)st stage begins bold sampling. 
Next let $B = \{\log l^k > -\log d^{-1}$ for all $1 \le k \le M\}$ and note that $\delta$ and
$(N^{(0)}, M^{(0)})$ coincide on $A_0 \cap B$ since $\log l^1 \ge IN_1 - \sigma\sqrt{\beta N_1\log N_1} > 0$
for small $d$ on $A_0$ and $\log l^k$ never crosses the lower boundary $-\log d^{-1}$ on
$B$. Clearly, $E_0(M \mid A_0 \cap B') = O(1)$, so using this crude bound and Wald's
likelihood identity,

$P_0(A_0 \cap B') \le P_0(B') = E_1(l^M; B') \le E_1(d; B') \le d$

and $E_0(N; A_0 \cap B) \le E_0N^{(0)}$ since $\delta$ and $(N^{(0)}, M^{(0)})$ coincide on $A_0 \cap B$,
so that

$cE_0(N; A_0) = cE_0(N; A_0 \cap B) + cE_0(N; A_0 \cap B')$
  $\le cE_0N^{(0)} + c \cdot O(d\log d^{-1})$
  $= cE_0N^{(0)} + o(c) = cE_0N^{(0)} + o(d)$,

which proves (A.25). Similarly, $E_0(M; A_0 \cap B) \le E_0M^{(0)}$ and $E_0(M \mid A_0 \cap B') = O(1)$,
so that

$dE_0(M; A_0) \le dE_0(M; A_0 \cap B) + dE_0(M \mid A_0 \cap B')P_0(A_0 \cap B')$
  $\le dE_0M^{(0)} + d \cdot O(1) \cdot d = dE_0M^{(0)} + o(d)$,
proving (A.26). Letting 7(d) = IN X - o^fiNi log N{ , 

P (D = l,A ) < P (D = 1\A ) = P (l M < -logd" 1 ) log/ 1 > 7 (d)) 

<exp[-(logd- 1 + 7 ((i))]=o((i), 

proving (A. 27), and a similar argument proves (A. 34). Since 7 ~ INi ~ 
logd -1 , we have 

Pi (A )=E (li X ; log h> 7(d)) <So(e- 7(<i) ; log h > 7(d)) 
<e-^ <exp[-(l/2)logd- 1 ] = v / d. 
Also, E 1 (N\A ) = O{logd- 1 ) by (A.35) so 

cE x {N; Ao) = cE^N^P^Aq) < c\Td ■ O^ogdT 1 ) = c • oil) = o(d), 

proving (A.28). (A.29) holds since Pi(M|A ) = O(l) and Pi(A)) -> and 
similarly for (A.33). Since 5 and (N^,M (0 ^) coincide on A f]B, 

Pi(D = Q,A r\B) = Pi(AT (0) <oo,A nB)< Pi(A^ 0) < 00). 



cE (N; A) = cE (N\A)P (A) = c • OOogcT 1 ) • oOlogd" 1 )"^ 2 ) 



Also 



P 1 (D = 0,A nB')=E [(l M y 1 ;D = 0,A nB / } 
<E [d;D = 0,A PiB'} 
<dP (B') = o(d) 
since clearly Pq(B') — > 0. Combining these two gives 

Pip = 0; A ) =Pi{D = 0; A n B) + Pi(P = 0; A n 5') 

<Pi(JV®< 00) + o(d), 

proving (A. 30). Now 




) 




(A.36) 



o(d)- 



(log^ 1 ) 1 ^/ 2 
d/c 



o(d)- 



(logd- 1 )^/ 2 )' 
/i m (log d" 1 ) 



o(d)- 



(logd- 1 )^/ 2 ) 

(logd^ 1 )( 1 /2) 



O(d) 
proving (A.32) and finishing the proof. □



Proof of Theorem 4.2. Let $T = \{t > 0: |\log t - I_0N_1| \le \sigma_0\sqrt{\beta N_1\log N_1}\}$,
where $\beta = 2 - (1/2)^{m-1}$ and $A_0 = \{l^1 \in T\}$. Let $\delta = (N, M, D)$ and
$\delta_0(l^1c, l^1d, z_0^*) = (N^{(0)}, M^{(0)})$. We will use the $r_{c,d}(\delta; A)$ notation as in (A.24). Clearly,
$l^1 > 1$ on $A_0$ for sufficiently small $c, d$, so that $\delta$ will switch and continue
sampling according to $\delta_0(l^1c, l^1d, z_0^*)$ after the first stage; hence,

(A.37)  $N \le N^{(0)} + N_1$ and $M \le M^{(0)} + 1$.

Also note that

(A.38)  $\{D = 0\} \cap A_0 \subseteq \{N^{(0)} < \infty\}$,

since on $\{D = 0\} \cap A_0$ the likelihood ratio will cross the boundary $d^{-1}$, which
is equivalent to the stopping rule of $\delta_0(l^1c, l^1d, z_0^*)$, as discussed above. Using
the bounds (A.27)-(A.29),

r c ,d{8\ A ) = itqcEq{N- Aq) + TT dE(M; A ) 

+ kxWiPi(D = Q,Aq) +o(d) 

= Eq[itqcN + irodM + tt iWi {I m )- 1 ■ 1{D = 0}; Aq] + o(d) 

(A.39) < Eq [ttqcN^ + n dM^ 

+ 7T 1 w 1 (l 1 )~ 1 (l M /l 1 )- 1 ■ 1{^ (0) < oo};A ] 
+ TrocNt + 7T d + o{d) [by (A.37) and (A.38)] 
= Eo[(p(l 1 );l 1 ' G T] + itqcNi + TtQd + o(d), 

where 

<p(t) = c^oko(ciV (0) + dM^l 1 =t] + mwtt^P^N^ < ooj/ 1 = t). 
By rearranging terms, 

<p(t) = Trot-^EoKt^N^ + (td)M^ I/ 1 = t] + Pi(iVW < oo]/ 1 = t)} 
(A.40) + (7r lWl - 7r )t _1 Pi(iV (0) < oo]/ 1 = t) 

= Trot-iRtcjdiSitcMzo)) + (ti^i - Trojr^CiV^ < oo)/ 1 = i). 
For any t G T, log^cTp 1 ~ (1 — Iq/I\) logd -1 , which implies that 

hmQogitd)- 1 ) ~ V.((l " /o/ZOlog^ 1 ) ~ (1 - /o//l) (1/2r /i m (logd- 1 ), 
and hence, {(tc, id)} G £> m uniformly for t G T. Moreover, by this last, 

(A 411 lim ^(^Q^M^o^ogN)" 1 ) _ H z o) 

1 ' J cd^o (td)/(tc) 1-$(jb5)' 
By Theorem 3.1, 

Rtc,td( tc i td ' z o) < te/ " 1 log(td)" 1 + u m (zo)td + o{td) 

and the proof of Theorem 3.1 shows that P±(N^ < oolZ 1 = t) = o(td) uni- 
formly for t&T. Plugging these last two into (A. 40), 

<p(t) < n t~ 1 [tclQ 1 log(td)~ 1 + u m (z* Q )td + o{td)\ + (k\Wi - ir^t^o^d) 
= ttqIcIq 1 logcT 1 + u m (zl)d] - ttqcIq 1 logt + o(d), 
uniformly on T, and, in turn, plugging this into (A. 39) gives 
r c ,d(^ A o) < McIq 1 + (1 + u m {z* Q ))d] 

(A.42) 

+ TrocIo-VoiVi - ^(logZ 1 ; I 1 e T)] + o(d). 

By repeating the argument leading to (A. 36), we have EoQogl ;Aq) = o(d/c); 
hence, (A.42) becomes 

(A.43) r Ctd (5;A ) < ir [cl l logd' 1 + (1 + u m (z* ))d] + o{d). 

Letting A\ = {| log^// 1 ) — I±Ni\ < o\ \J (3Ni log N[ } and repeating arguments 
in the proof of Theorem 4.1 give 

r c ,d{&; Ai) < vrifc/f 1 logd" 1 + d ■ u m (Qi, C J 1 - 1 Ii)] + o(d) 

and ^((Jj^ni'i) = o(d). Combining with (A.43) gives (4.8) with a "<." 

Next we show that (4.7) holds with a ">." Let l* k = l N , k , T* = {t > : 
| log t - J JVf| < a ^/(3N*logN*} and A* = {I* 1 G T*}. Let 

r* = TTiicE.N* + dEiM*) + ^w^P^D* =i), i = 0, 1. 

Since 5* follows its first stage with the optimal continuation (N* ,M* ,D*), 
we can write 

r* = E [ir (cN* + dM*) + inw^r^yHiD* = 0}] 

(A.44) = E [ir (cE N* + dE Q M*) + ttiWiCP^PiGD* = 0)] 

+ 7r (dVj +d). 

Define </?*(*) = ^w^ 1 {E [c(t) N* + A>T* = i] + = Oir 1 = t)}, 

where c(t) = ctiro / (ttiWx) and d(t) = dtitQ / {tt\Wi) . It will be shown below 
that iV* ~ I£~ logd -1 . Assuming this holds, the arguments leading to (A. 41) 
show that it holds with (tc,td) replaced by (c(t),d(t)). Then by Lemma A. 5, 

<p*(t) > n^r 1 [c(t) I Q 1 log d^ 1 + u m (z*)d(t) +o(d(t))] 

(A.45) 

= 7r [c/ logd + u m {zl)d - ttqcIq \ogt + o(d) 
uniformly for $t \in T^*$, and hence,

rS = £b[^*(r 1 )]+7ro(c^+d) 

> Eoitp^F^A^+TToicN* + d) (since if* > 0) 

(A.46) > 7r [c/ -1 logd -1 + u m (z* )d]P (A* ) - Trca^oNsT 1 ;^] 

+ 7r (dVJ + d) + o(d) [by (A.45)] 

>7r [c/ -1 logd -1 + (l + ))d]+ (d), 

this last by the arguments leading to (A. 36). A straightforward application 
of Lemma A. 5 gives r\ > 7Ti[cXf logd -1 +u m (z*)d] +o(d), and adding these 
last two gives (4.7). 

All that remains is to verify that $N_1^* \sim I_1^{-1}\log d^{-1}$. Suppose instead that

(A.47)  $L = \liminf_{c,d\to 0} \frac{N_1^*}{\log d^{-1}} < I_1^{-1}$.

Then there is a sequence {(c, d)} approaching (0,0) on which the liminf is 
achieved, and by repeating the above arguments on this sequence, 

(A.48) r* > ttoIcIq 1 logd -1 + (1 + u m (z' ))d] + o(d), 

where z' is the unique solution of 

^( z o) _ i^(o'o 1 Io)h m (aQ 1 (l-IoL)logd~ 1 ) 



1 - $(z' Q ) c,d->o d/c 
By writing 

/ 1 — I L \ (i/ 2 )" 1 

h m (a^(l-I L)\ogd~ l ) = ^ _ j xM^o^l-V^logd- 1 ), 
we have 

0(4) _fl-Jo4V 1/2r 



x lim 



n m (a 1 Io)h m (a 1 (l - 7 /Ji)logd x ) 



c,<2^0 d/c 

l-Zq^ N (1/2)m x 0(go) > 0(4) 
1-k/hJ l-$(4) - l-$(4)' 

Hence, 4 > Zq since z i— ► 4>(z)/[l — $(z)] is increasing, so (A.48) becomes 

(A.49) r* > Mdo 1 logd -1 + (1 + n m (4))d] + o(d), 

since u m is strictly increasing. By reversing indices and repeating this ar- 
gument, conditioning on {| log(l/r 1 ) - I\N*\ < <j\ ^/JJNfTogNj } instead of 
Aq, we obtain 

(A.50) rj > 7Ti[c/ilogd -1 + {1 + u m {z[))d] + o(d). 
Using (A.49), (A.50) and (4.7), we would then have

r *,d ~ r c,d( s ) = r o + r l- r c,d( 6 ) 

> 7Ti<2[l + u m (zi) - u m (zl)] + o{d) 
>ed + o(d) >0 

for some e > and sufficiently small c,d since m < u m < m + 1. This obvi- 
ously contradicts r* d < r Cj d(5) so (A. 47) cannot hold. On the other hand, 
if 

N* 

(A.51) 77 = limsup- \- r - Jf 1 > 0, 

then again on a sequence {(c, d)} approaching (0,0), we would have 
r *,d ~ r c,d(S) > ^odo 1 log cT 1 + mcN* - r c4 (5) (by Lemma A. 5) 

> TTocIq 1 log (T 1 + 7T1C(?7 + If X ) logrf-^l + o(l)) 

- [(tto/Jo +vr 1 // 1 )clog^ 1 + 0(d)] [by (A.51) and (4.8)] 
= 7ri(7/ + o(l))-clogd- 1 + 0(d) 

> 7ri(r?/2) -c log d" 1 +o(c log cT 1 ) > 

for sufficiently small $c,d$, again a contradiction. Thus, (A.51) cannot hold
either, showing that $N_1^* \sim I_1^{-1}\log d^{-1}$ and completing the proof. □